7 - Machine Learning for Physicists

Let's get started. Today I want to start with recurrent neural networks. Vittoria already started to discuss them last time, but I think not many were there, so let's recap. So what are recurrent neural networks good for? The main task you want to solve is to look at a time series, at a video, or at a sentence, which is a sequence of words, and do something with it. For example, you want to classify it: instead of classifying an image, you want to classify a sentence and say the person saying this is sad or happy. Or you want to classify an experimental time series and say that this noisy, fluctuating voltage signal indicates that the qubit is in the excited state. So there are many, many examples of trying to analyze time series.

So what you could do, of course, is to feed the whole time series into the kind of standard neural networks that we've been looking at. If the time series consisted of 10,000 time points, you could just feed all of them as one gigantic vector of dimension 10,000 into the input layer, so there would have to be 10,000 input neurons, and then you could process this. But of course, that doesn't seem very efficient. You could also, as we will mention in a moment, go to convolutional neural networks, but that's also not completely efficient.
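To make this naive approach concrete, here is a minimal sketch in Keras; the hidden-layer size and the two-class output are illustrative assumptions, not something from the lecture:

```python
from tensorflow import keras

# Naive approach: the whole series becomes one gigantic input vector.
model = keras.Sequential([
    keras.Input(shape=(10_000,)),                # one input neuron per time point
    keras.layers.Dense(100, activation="relu"),  # illustrative hidden layer
    keras.layers.Dense(2, activation="softmax"), # e.g. qubit ground vs. excited
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Note that every weight in the first layer is then tied to one absolute time point, which is part of why this is so wasteful.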

And so recurrent neural networks take seriously the fact that you are dealing with a time series, and they really try to treat it as a time series. You use these networks, in a way that we'll discuss in a moment, to go from left to right through the sequence: you start at small times, and then you process the series step by step. That's the general idea behind time series processing using neural networks. And in order to be able to do this, such a network has to have a kind of memory, because you cannot understand or judge the meaning of a word in a sentence without a memory of all the words that came before. Or, if you have a noisy, fluctuating time trace that is supposed to tell you what state the qubit is in, you have to keep all of it in memory somehow; you cannot just look at a single time point and decide based on that. So these are networks with some kind of memory, and we are going to discuss what this really means in a moment.
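To make the idea of "memory" concrete already here, this is a minimal sketch of a plain (vanilla) recurrent update in NumPy; the dimensions and the tanh nonlinearity are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 3, 8                        # illustrative sizes
W_in = rng.normal(size=(d_hidden, d_in))     # input -> hidden weights
W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden -> hidden: the memory
b = np.zeros(d_hidden)

xs = rng.normal(size=(50, d_in))  # a toy time series: 50 steps, 3 numbers each
h = np.zeros(d_hidden)            # the memory starts out empty
for x in xs:                      # go through the sequence from left to right
    h = np.tanh(W_in @ x + W_h @ h + b)  # new state mixes input and old state
# after the loop, h summarizes the whole series and could feed a classifier
```

The crucial point is the h on the right-hand side of the update: the state at one time step feeds back into the next step, which is exactly the memory the naive dense network above lacks.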

Okay, so this is the kind of picture we want to draw. Time runs from left to right, and every circle on this axis represents, say, one piece of data; it could be a single number, or it could be a vector. And for each time point, thinking of time as discrete, there is another input: here we have some input, one time step later there's another input, then another input, and so on. This is the kind of time series you want to process.

And we already know that, if we don't want to stack all of this together and feed it into a standard neural network, there is an alternative: convolutional neural networks. Convolutional neural networks, as you know, make use of translational invariance. Translational invariance in this case means translational invariance in time: things that happen here are conceptually equivalent to things that happen earlier in time. And then you could apply a convolutional neural network, with the usual setup: as we discussed, you apply a kernel, you convolve with this filter, the filter has a finite size, and then you get the output. So the output at this particular time point gathers the information from the surrounding time points, the output at another time point gathers the information around that time point, and so on. That's processing of a time series using a convolutional neural network, and people are really using this. Recently I learned, and I wasn't aware of this before, that the quite good translation service called DeepL, which is actually a German company, is using some version of convolutional neural networks for its translation tasks.
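Here is a minimal sketch of such a 1D convolution over a time series, written out by hand in NumPy; the filter size of 3 and the particular kernel values are illustrative assumptions:

```python
import numpy as np

def conv1d(series, kernel):
    """Slide a finite-size filter along the series (stride 1, no padding)."""
    k = len(kernel)
    # output[t] gathers information only from k neighbouring time points
    return np.array([series[t:t + k] @ kernel
                     for t in range(len(series) - k + 1)])

series = np.sin(np.linspace(0, 10, 100))  # a toy time series
kernel = np.array([1.0, 0.0, -1.0])       # an illustrative size-3 filter
out = conv1d(series, kernel)              # each output value sees only 3 points
```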

Okay. There is a catch, however, which is obvious in this picture: the output at any given moment in time only depends on a finite time range, namely the finite time range given by the filter size. So this particular output neuron only depends on, in total, the three time points shown here, and maybe that's not good enough. Now you can always increase the filter size, of course, but this becomes a bit cumbersome.
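As a quick back-of-the-envelope check (my own arithmetic, not from the lecture): with stride 1 and no dilation, each extra convolutional layer with kernel size k only widens the receptive field by k - 1 time points.

```python
# Receptive field of stacked 1D convolutions (stride 1, no dilation):
#   receptive_field = 1 + n_layers * (k - 1)
k, n_layers = 3, 10
receptive_field = 1 + n_layers * (k - 1)
print(receptive_field)  # -> 21: even ten size-3 layers see only 21 time steps
```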

So long memories with convolutional neural networks are really challenging, because, first, we would…
